Abstract

An introduction to orbit-finite sets, a type of set that is infinite enough to describe 
interesting examples, yet finite enough to admit algorithms. The notion of orbit-finiteness 
is illustrated on the example of register automata, an automaton model dealing with 
infinite alphabets. 

1 Introduction


An orbit-finite set is a set that is constructed using some infinite logical structure, such as 
(N, =) or (Q, <), and which is finite up to automorphisms of that structure. For example, if 
the structure is (N,=), then 


$\{X : X \subseteq N \text{ and } |X| \le 3\}$


is an orbit-finite set, because automorphisms of the structure (i.e. permutations of N) can be 
used to map any finite subset $X \subseteq N$ to any other subset of the same cardinality, and therefore the set 
has four elements (cardinalities 0, 1, 2, 3) up to automorphisms. 
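
The count of four orbits can be checked by brute force on a finite stand-in for the atoms. The following is only a sketch: it replaces N by a five-element universe (an assumption made so that all permutations can be enumerated), and the function name `orbits_of_small_subsets` is ours.

```python
from itertools import combinations, permutations


def orbits_of_small_subsets(universe_size, max_size):
    """Group subsets with at most max_size elements into orbits under
    all permutations of range(universe_size)."""
    universe = range(universe_size)
    remaining = {
        frozenset(c)
        for k in range(max_size + 1)
        for c in combinations(universe, k)
    }
    orbits = []
    while remaining:
        x = next(iter(remaining))
        # The orbit of x: all images of x under permutations of the universe.
        orbit = {frozenset(p[i] for i in x) for p in permutations(universe)}
        orbits.append(orbit)
        remaining -= orbit
    return orbits


orbits = orbits_of_small_subsets(universe_size=5, max_size=3)
print(len(orbits))  # 4, one orbit per cardinality 0, 1, 2, 3
```

Any universe with at least three elements gives the same count, which is the sense in which the set is finite up to automorphisms.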

The goal of this paper is to give a gentle introduction to orbit-finite sets, in particular 
to explain how they can be represented and manipulated using algorithms. As a running 
example we use register automata over infinite alphabets. A more detailed description can 
be found in the lecture notes [5]. 


2 The running example: register automata 


As our running example for describing orbit-finite sets, we consider register automata over 
data words, and their associated decision problems such as emptiness or minimisation. 

Typically, formal language theory uses finite alphabets. Here is an example which uses 
an infinite alphabet. 


> Running Example. Let A be some infinite set. As our running example, consider the 
language 


$\{w \in A^* : \text{at most two distinct letters appear in } w\}$. (1) 

[Figure 1: diagram of the register automaton. Callouts on one configuration read: "control is q", "first register stores atom a", "second register is empty".]

Figure 1 There are two possible values for the control state, q and $\bot$. A configuration of the 
automaton consists of a value for the control plus a valuation of the registers. Dangling arrows 
indicate initial and accepting configurations. The variables a,b,c range over distinct atoms, i.e. to 
get the automaton one should instantiate the picture for every triple $(a,b,c) \in A^3$ of distinct atoms. 


In our running example, the letters are only compared with respect to equality. Languages 
as in the example are called languages of data words. The most common type of alphabet 
for data words is of the form $\Sigma \times A$, where $\Sigma$ is a finite set and A is an infinite set whose 
elements can only be compared for equality. We will use the name atoms for A, to underline 
that they have no structure except for equality. Automata that process data words (and more 
complicated objects, such as data trees) are a popular model in database theory (e.g. an XML 
document is conveniently described as a type of data tree) or in the theory of verification. 
See the survey [25] for more on such automata. 

One of the most basic automata models for data words is register automata (introduced 
in [17] under the name of finite memory automata). This is a type of automaton which uses 
finitely many registers to store some of the atoms that have been seen so far. We will use 
register automata as our running example of orbit-finite sets. 


> Running Example. The language in the running example, namely “at most two distinct letters 
(atoms) appear”, is recognised by a register automaton with two registers. The registers are 
used to store the at most two distinct atoms in the input. Once the two registers are filled up 
and a fresh third atom appears, the automaton enters a rejecting sink state. The automaton 
is shown in Figure 1. 
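
The behaviour described above can be simulated directly. The sketch below is not the formal syntax of [17]; it assumes that a configuration is a pair (control, registers), that an empty register is modelled by `None`, and that the names `SINK`, `step`, `run` and `accepts` are ours.

```python
SINK = "sink"  # hypothetical name for the rejecting control state


def step(config, atom):
    """One transition of the two-register automaton; only equality
    tests between the input atom and the registers are used."""
    control, (r1, r2) = config
    if control == SINK:
        return config
    if atom == r1 or atom == r2:
        return config                    # atom already stored
    if r1 is None:
        return (control, (atom, None))   # fill the first register
    if r2 is None:
        return (control, (r1, atom))     # fill the second register
    return (SINK, (None, None))          # a fresh third atom: reject


def run(word):
    config = ("q", (None, None))         # initial configuration
    for atom in word:
        config = step(config, atom)
    return config


def accepts(word):
    return run(word)[0] != SINK


print(accepts("abab"), accepts("abc"))  # True False
print(run("ab"), run("ba"))             # two different configurations
```

The last line shows that the configuration remembers the order in which the two atoms arrived.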


In general, the space of configurations of a register automaton is defined by giving a finite 
set of control states and a set of register names. A configuration is then a pair: (control 
state, partial function from register names to atoms). For the precise syntax of the transition 
relation (and notions of initial and final configurations), we refer to [17]; for the purposes of 
this paper it suffices to say that the syntax is designed so that transitions depend on the 
atoms in a way which only uses equality. Register automata, and especially deterministic 
register automata, are one of the simplest automaton models for data words, e.g. they are 
not expressive enough to recognise the language “all input letters are different”. For more 
expressive models, see [25]. 


3 Mild extensions of register automata 


To motivate the introduction of orbit-finite and definable sets, which are the topic of this 
paper, we present three mild requirements for extending the model of register automata. 
We will then show that these requirements can be met by automata with definable (or 
equivalently, orbit-finite) descriptions. 

3.1 Minimisation 


One natural thing to do with (deterministic) register automata is to minimise them. As we 
will see, the register mechanism is not well suited to this task. 


> Running Example. Consider the automaton from Figure 1, which uses two registers to 
recognise words with at most two distinct atoms. This automaton has different configurations 
after reading inputs ab and ba, because its configuration implicitly remembers which letter 
was read first. A minimal automaton, on the other hand, should have the same configuration 
after reading these two inputs, because the language is commutative. In a minimal automaton, 
the set of reachable configurations should be something like: 


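$$\underbrace{\{\epsilon, \bot\}}_{\text{initial state and rejecting sink}} \;\cup\; \underbrace{\{a : a \in A\}}_{\text{words that have exactly one atom}} \;\cup\; \underbrace{\{\{a,b\} : a \neq b \in A\}}_{\text{words that have exactly two atoms}}$$
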
Such a configuration set cannot be achieved with the register mechanism. 
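
To see why unordered sets help, here is a small sketch of an automaton whose states are sets of atoms. It assumes that the initial state is the empty set and that a one-atom state is stored as a singleton; `BOTTOM`, `step` and `run` are names introduced for the illustration.

```python
BOTTOM = "bottom"  # hypothetical name for the rejecting sink


def step(state, atom):
    """States are frozensets with at most two atoms, or BOTTOM."""
    if state == BOTTOM:
        return BOTTOM
    new_state = state | {atom}
    return new_state if len(new_state) <= 2 else BOTTOM


def run(word):
    state = frozenset()  # initial state: no atom seen yet
    for atom in word:
        state = step(state, atom)
    return state


print(run("ab") == run("ba"))  # True: the order of arrival is forgotten
print(run("abc") == BOTTOM)    # True: a third atom leads to the sink
```

Unlike a pair of registers, the state after `ab` and after `ba` is literally the same object.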


A workaround for the problems described above would be to allow registers which store 
unordered sets of atoms. Here is another example language, where the workaround with 
unordered sets of atoms fails. 


> Example 1. Consider the following language 
$\{wv : w,v \in A^3 \text{ are equal up to cyclic shift}\}$. (2) 


For this language, the set of configurations of a minimal automaton reachable after reading 
three letters should be 


$\{\{abc, bca, cab\} : a,b,c \in A\}$,

where each element $\{abc, bca, cab\}$ is the equivalence class of a three-letter word up 
to cyclic shifts. 
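
As a small illustration, the equivalence class of a word up to cyclic shifts can be computed by listing its rotations. The helper names `cyclic_class` and `equal_up_to_cyclic_shift` are ours.

```python
def cyclic_class(word):
    """The set of all cyclic shifts of a word, as a frozenset of tuples."""
    w = tuple(word)
    return frozenset(w[i:] + w[:i] for i in range(max(len(w), 1)))


def equal_up_to_cyclic_shift(w, v):
    return len(w) == len(v) and tuple(v) in cyclic_class(w)


print(cyclic_class("abc") == cyclic_class("bca") == cyclic_class("cab"))  # True
print(equal_up_to_cyclic_shift("abc", "acb"))                             # False
```

A state of this kind is neither an ordered triple nor an unordered set of atoms, which is why the workaround with unordered sets is not enough.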


Example 1 and the running example essentially exhaust the possible problems with 
minimisation: to minimise register automata it suffices to consider a model where each 
control state has a varying number of registers, and there might be some group acting on 
the registers (e.g. so that the registers form an unordered set, or can be shifted cyclically, 
etc.), see [8, Section 6]. The idea to use groups acting on registers dates back to [23]. 

Why is it so important to minimise automata? A minimal automaton is not just small — 
with the obvious efficiency advantages — but more importantly it is a canonical representation 
of its recognised language. 

One use for canonical automata is that for the correctness of certain algorithms, it is 
useful to assume that the input is a canonical automaton. An example is learning algorithms, 
whose running time bounds and correctness proofs are based on minimal automata, see 
e.g. [20] for a variant of the Angluin algorithm for register automata. Another example is 
algorithms which input an automaton and decide if its recognised language is definable in 
some logic, see [2, 6]; here non-definability is typically equivalent to some kind of forbidden 
pattern in the canonical automaton. 

Another use for canonical automata is that they can be useful when trying to better 
understand “regularity” for languages over an infinite alphabet. There is a multitude of 
automata models for infinite alphabets, see [22, 25], most of them with different expressive 
powers, and one naturally asks [4]: which model defines the “regular languages”? Over finite 
alphabets, the Myhill-Nerode theorem gives a convincing machine-independent characterisation 
of regular languages in terms of minimisation: a language is regular if and only if it has 
finitely many equivalence classes in its syntactic congruence (and these equivalence classes 
form the states of the canonical minimal automaton). Therefore, a natural idea is to study 
the syntactic congruences of languages such as the one in the running example, and try to 
find devices which store the information from the syntactic congruence and nothing else. This 
idea was pursued for monoids (corresponding to a two-sided syntactic congruence) in [6], for 
deterministic automata (corresponding to a one-sided syntactic congruence, which is different 
from the two-sided one in the presence of infinite alphabets) in [8], and for deterministic timed 
automata in [3, 10]. In all of these cases the straightforward register mechanism (without 
group actions and other features) is insufficient to allow minimisation. See also [21] for 
algorithmic aspects of minimising register automata. 


3.2 More general input alphabets 


In data words and register automata, the input alphabet is typically assumed to be of the 
form $\Sigma \times A$ for some finite set $\Sigma$. Why not allow slightly more general input alphabets, such 
as pairs of atoms $A^2$ or unordered pairs of atoms $\binom{A}{2}$? At first sight this seems like a needless 
generalisation, but it turns out that some interesting theoretical issues appear only for the 
more general alphabets. As an example [18, Example 2.5], which admittedly goes far beyond 
register automata because it uses Turing machines, consider the following set: 


$\{\{\{a,b,c\},\{d,e,f\}\} : a,b,c,d,e,f \in A \text{ are pairwise different}\}$. 
Suppose that the input alphabet $\Sigma$ is the above set, and consider the language: 
$\{wv : w,v \in \Sigma^* \text{ are such that } \pi(w) = v \text{ for some permutation } \pi \text{ of the atoms}\}$. 


In [18] it is shown that the above language witnesses that deterministic and nondeterministic 
Turing machines (in a suitable generalisation for infinite alphabets using atoms) have different 
expressive powers, and simpler alphabets (e.g. alphabets which talk about fewer than six 
atoms) do not witness this. This result builds on [9], which in turn builds on the seminal 
Cai-Fürer-Immerman [27] construction. Readers familiar with [27] will recognise the 
use of six atoms in the alphabet $\Sigma$. 


3.3 Atoms with more structure than just equality 


In our definition of data words and register automata, the atoms A had equality as the 
only available structure. Why not allow for more structure? Data words with additional 
structure, such as a total order, have long been present in the literature on data words. 
For example, [13] shows that emptiness is decidable for register automata where the input 
alphabet is (N, <) and the register operations can compare letters with respect to the order. 
Another important example is timed automata [1], which can be viewed as a special kind 
of register automata where the atoms are (Q,<,+1), see [10] and [14, 15] for the case of 
pushdown automata. Other examples come from modelling programs interacting with a 
database, see e.g. [26, 11]; in these applications the atoms might have e.g. an arbitrary graph 
structure. 


4 Definable sets 


In Section 2 we described register automata, and in Section 3 we discussed three requirements 
for a more general model: it should minimise (Section 3.1); it should allow more general 
input alphabets than only the atoms (Section 3.2); and it should allow more structure on 
the atoms than just equality (Section 3.3). In this section we present such a model, using 
the notion of definable sets. The general idea is that definable sets are those that can be 
constructed using set-builder notation. Before giving the precise definition, consider some 
examples. 


> Example 2. Here are some of the sets that we have seen so far: 

atoms: $A$
ordered pairs of atoms: $A^2$
unordered pairs of atoms: $\{\{a,b\} : a,b \in A\}$
states in the minimal automaton from the running example: $\{\epsilon, \bot\} \cup \{a : a \in A\} \cup \{\{a,b\} : a \neq b \in A\}$
triples of atoms modulo cyclic shift: $\{\{abc, bca, cab\} : a,b,c \in A\}$
input alphabet for a language which witnesses that Turing machines with atoms do not determinise: $\{\{\{a,b,c\},\{d,e,f\}\} : a,b,c,d,e,f \in A \text{ are distinct}\}$


The above examples used only equality. In the spirit of Section 3.3, let us consider some 
examples which assume that the atoms are equipped with structure other than equality. 


> Example 3. Assume that the atoms are equipped with a total order, and assume that 5 is 
one of the atoms. Here are some examples of sets defined using set builder notation: 


atoms smaller than 5: $\{a : a \in A \text{ with } a < 5\}$
all closed intervals: $\{\{c : c \in A \text{ with } a \le c \le b\} : a,b \in A \text{ with } a < b\}$


We now give a more formal description of set-builder notation and sets defined by it. The 
notation is parametrised by an underlying logical structure A, i.e. a universe equipped with some relations 
and functions (equality is for free), which will be referred to as the atoms. 


> Example 4. Here are some examples of logical structures that we will use as atoms: 


A = (N, =): equality only
A = (Q, <): ordered rationals
A = (N, +): Presburger arithmetic
A = (N, +, ×): arithmetic


In the first structure, we write the equality symbol (which is implicitly assumed to be always 
available) just to underline that it is the only structure available. 


Given a logical structure A, we define set-builder expressions over A by induction as follows. 
Fix some infinite set of variables {a,b,c,...}, which will be used in the set-builder expressions, 
and which are intended to range over atoms, i.e. elements of the universe of A. Subexpressions 
in a set-builder expression can have free variables, but in the end we are interested in an 
outermost expression with no free variables. There are four constructions: each element of A is 
an expression (constant expression); each variable is an expression (variable expression); and 
expressions can be combined using binary union and set comprehension. These constructions 
are illustrated in Figure 2. 

$\{\{a\} \cup \{b\} : a,b \in A \text{ with } a \neq b \land b \neq 3\} \cup \{5\}$

[Annotations in the figure: set comprehension expression, with no free variables; set comprehension expression, with free variable a; bound variables in set comprehension; guard of set comprehension, can use quantifiers; constant used in guard; constant expression; set comprehension expression with no bound variables; union expression; variable expression.]

Figure 2 A set-builder expression over A = (N, =). 


The semantics of a set-builder expression is a function which inputs a valuation of its 
free variables (i.e. a function from the free variables to the universe of A) and outputs an 
atom, a set, a set of sets, etc. defined in the obvious way. A definable set over A is the 
semantics of a set-builder expression without free variables. The reader will readily see that 
all sets mentioned in Example 2 are definable regardless of the choice of A (see the running 
example below for an explanation of how pairing and elements such as $\bot$ are encoded) and 
the sets mentioned in Example 3 are definable as long as the vocabulary of A contains a 
binary relation < and the universe contains an element 5. 
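
The four constructions and their semantics can be sketched as a small evaluator. This is only an approximation under two assumptions that are ours: the universe of atoms is replaced by a finite range so that comprehensions can be enumerated, and guards are given as Python predicates on valuations rather than first-order formulas. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Tuple


@dataclass(frozen=True)
class Const:            # constant expression: an element of the universe
    atom: int


@dataclass(frozen=True)
class Var:              # variable expression
    name: str


@dataclass(frozen=True)
class Union:            # binary union of two set-valued expressions
    left: object
    right: object


@dataclass(frozen=True)
class Comprehension:    # { body : bound variables with guard }
    body: object
    bound: Tuple[str, ...]
    guard: Callable[[dict], bool]


def evaluate(expr, universe, valuation=None):
    """Semantics of an expression under a valuation of its free variables."""
    env = dict(valuation or {})
    if isinstance(expr, Const):
        return expr.atom
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Union):
        return evaluate(expr.left, universe, env) | evaluate(expr.right, universe, env)
    if isinstance(expr, Comprehension):
        elements = set()
        for atoms in product(universe, repeat=len(expr.bound)):
            local = {**env, **dict(zip(expr.bound, atoms))}
            if expr.guard(local):
                elements.add(evaluate(expr.body, universe, local))
        return frozenset(elements)
    raise TypeError(f"not a set-builder expression: {expr!r}")


def singleton(expr):
    """{expr}: a comprehension with no bound variables and a trivial guard."""
    return Comprehension(expr, (), lambda env: True)


# {{a} ∪ {b} : a, b with a ≠ b and b ≠ 3} ∪ {5}, evaluated over atoms 0..5
expr = Union(
    Comprehension(
        Union(singleton(Var("a")), singleton(Var("b"))),
        ("a", "b"),
        lambda env: env["a"] != env["b"] and env["b"] != 3,
    ),
    singleton(Const(5)),
)
result = evaluate(expr, universe=range(6))
print(len(result))  # 16: the fifteen two-element subsets of {0,...,5} and the atom 5
```

Over the actual infinite atoms the result is infinite, so an implementation has to manipulate the expression itself instead of enumerating its elements.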

Our proposed solution to the requirements raised in Section 3 is to consider automata 
where all components (the input alphabet, the states, the transition relation, the initial and 
accepting states) are definable over some structure A. We call these definable automata over 
A [8, 11]. 


> Running Example. Assume that the atoms are A = (N,=), i.e. some countably infinite 
set with equality only. Here is a deterministic automaton which recognises the language from 
our running example, i.e. words in A* with at most two distinct letters. The input alphabet is 
A, which is clearly a definable set, as defined by the set-builder expression {a:a € A}. The 
set of states Q is 


$\{\emptyset, \bot\} \cup \{\{a\} : a \in A\} \cup \{\{a,b\} : a \neq b \in A\}$, 
which is also definable as long as we assume that $\bot$ is syntactic sugar for some definable set 
like $\{\emptyset\}$. The initial state is $\emptyset$, which is clearly definable, and all states are final except 
$\bot$. Here is the set of transitions, which happens to be a total function of type $Q \times A \to Q$ as 
required in a deterministic automaton: 

$\{(\emptyset, a, \{a\}) : a \in A\} \;\cup$

$\{(\{a\}, b, \{a,b\}) : a,b \in A\} \;\cup$

$\{(\{a,b\}, a, \{a,b\}) : a,b \in A\} \;\cup$

$\{(\{a,b\}, c, \bot) : a,b,c \in A \text{ with } a \neq b \land a \neq c \land b \neq c\} \;\cup$

$\{(\bot, a, \bot) : a \in A\}$
The above description uses ordered triples, which are formally not part of the syntax of set-builder 
expressions. However, triples and other tuples can be encoded in sets using Kuratowski 
pairing: 

$(x,y) = \{x, \{x,y\}\}$ \qquad $(x,y,z) = ((x,y),z)$. 

Under the above encoding of triples as sets, it is clear that the transition relation defined 
above is an example of a definable set. 
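
The following sketch instantiates the five parts of the transition relation over a finite sample of atoms and checks that the result is a total function, and that the set encoding of triples does not identify distinct transitions. It assumes that the atoms are the integers 0, 1, 2, that sets are Python frozensets, and that $\bot$ is encoded as $\{\emptyset\}$; the names `pair`, `triple` and `transitions` are ours.

```python
from itertools import product

EMPTY = frozenset()
BOTTOM = frozenset({EMPTY})  # the sink, encoded as {∅}


def pair(x, y):
    """The pairing written above: (x, y) = {x, {x, y}}."""
    return frozenset({x, frozenset({x, y})})


def triple(x, y, z):
    return pair(pair(x, y), z)


def transitions(atoms):
    """The five parts of the transition relation, over a finite set of atoms."""
    t = set()
    for a in atoms:
        t.add((EMPTY, a, frozenset({a})))
        t.add((BOTTOM, a, BOTTOM))
    for a, b in product(atoms, repeat=2):
        t.add((frozenset({a}), b, frozenset({a, b})))
        t.add((frozenset({a, b}), a, frozenset({a, b})))
    for a, b, c in product(atoms, repeat=3):
        if a != b and a != c and b != c:
            t.add((frozenset({a, b}), c, BOTTOM))
    return t


atoms = range(3)
delta = transitions(atoms)
states = {q for q, _, _ in delta} | {q for _, _, q in delta}
encoded = {triple(q, a, r) for q, a, r in delta}

print(len(states), len(delta))      # 8 24: eight states, one transition per state and letter
print(len(encoded) == len(delta))   # True: distinct triples have distinct encodings
```

Each of the eight states has exactly one outgoing transition per letter, which is the finite shadow of the claim that the relation is a total function.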


Based on the above example, it is easy to see that definable automata generalise register 
automata. What is more, the added flexibility of definable automata is sufficient to satisfy 
the requirements mentioned in Section 3, in particular minimisation, i.e. for every determin- 
istic definable automaton there exists a unique (up to isomorphism of automata) minimal 
deterministic definable automaton which recognises the same language [8, Theorem 3.8]. 
(This minimisation requires an additional assumption on the atoms, called oligomorphism, 
which will be discussed in Section 6.) 

Without further restrictions on the atoms A, definable automata are too general to be 
useful, as shown in the following example. 


> Example 5. Assume that the atoms are Presburger arithmetic A = (N,+). Consider an 
automaton where the input alphabet has one letter only, say the input alphabet is {0}, and 
the set of states is the definable set of all atoms (i.e. natural numbers). Assume that there is 
only one final state, namely the natural number 0. Since there is only one input letter, the 
transition function can be viewed as a function $A \to A$. As the transition function, consider 
the function 

$$a \mapsto \begin{cases} a/2 & \text{if } a \text{ is even} \\ 3a+1 & \text{otherwise} \end{cases}$$

which is a definable set as witnessed by the following set-builder expression for its graph: 

$\{(a,b) : a,b \in A \text{ with } a = b+b\} \;\cup$
$\{(a,b) : a,b \in A \text{ with } \neg(\exists c\; a = c+c) \land b = a+a+a+1\}$. 


The transition function is based on the famous Collatz conjecture, which in the terminology 
of this example says: for every choice of initial state, at least one input word is accepted. In 
fact, all decision problems, such as emptiness or equivalence, are going to be undecidable 
for this particular choice of atoms (Presburger arithmetic, or even (N, <) or (N,+1)), which 
follows from the fact that Minsky machines are a special case of definable automata. 
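
As a sanity check, the guard of the set-builder expression can be compared with the transition function on an initial segment of the natural numbers. The sketch uses only addition and equality in the guard, as in Presburger arithmetic; the existential quantifier is evaluated by searching witnesses up to a, which suffices because any c with a = c + c is at most a. The function names are ours.

```python
def collatz_step(a):
    return a // 2 if a % 2 == 0 else 3 * a + 1


def in_graph(a, b):
    """Guard of the set-builder expression for the graph of the function."""
    even_case = a == b + b
    a_is_even = any(a == c + c for c in range(a + 1))  # exists c with a = c + c
    odd_case = (not a_is_even) and b == a + a + a + 1
    return even_case or odd_case


agree = all(
    in_graph(a, b) == (b == collatz_step(a))
    for a in range(50)
    for b in range(200)
)
print(agree)  # True
```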


> Example 6. Assume that the atoms are arithmetic A = (N,+, x), and consider a set-builder 
expression of the form 


$\{a : a \in A \text{ with } \varphi(a)\}$. 


The above set is empty if and only if $\varphi(a)$ is unsatisfiable. Since satisfiability in arithmetic 
is undecidable, it follows that one cannot even decide if a set-builder expression describes 
an empty set. (This is in contrast with Presburger arithmetic from Example 5, where at 
least emptiness for set-builder expressions is decidable, by virtue of Presburger arithmetic 
having a decidable theory, see [19, Proposition 2].) Other problems, such as nonemptiness 
or equivalence for automata are clearly also going to be undecidable when the atoms are 
arithmetic. 


5 Graph reachability for definable sets 


In the previous section, we introduced definable sets, and discussed definable automata, 
i.e. automata where all components are definable sets. In Examples 5 and 6 we argued 
that emptiness for automata is undecidable when the atoms are (N, +) or (N, +, ×). In this 
section, we discuss algorithmic problems like emptiness of automata in more detail. Instead 
of automata emptiness, we will discuss the essentially equivalent but more fundamental 
problem of graph reachability. We formalise the problem and present a condition on the atom 
structure A which guarantees the graph reachability problem is decidable. This condition is 
going to be violated for atoms like (N, <), (N, +) and (N, +, ×) but it is going to be satisfied 
for atoms like (N, =) or (Q, <). 

    input V, E, s, t given by set-builder expressions

    R := {s}
    repeat
      for v in R do
        for w in V do
          if {v,w} in E then
            R += {w}
    until R does not grow

    if t in R then
      print "reachable"
    else
      print "unreachable"

Figure 3 A naive algorithm for reachability. 


The graph reachability problem 


For a logical structure A, define reachability for definable graphs over A to be the following 
decision problem: 

- Input. A graph (V, E) where V, E are definable sets and two vertices $s, t \in V$. 

- Question. Is there a path from s to t? 


In the decision problem above, the inputs are represented by set-builder expressions. This 
representation assumes that there is some way of representing the universe of A, which is the 
case for all atom structures we have discussed so far, where the universe consists of natural or 
rational numbers. We are mainly interested in decidability and not in the precise complexity 
of the decision problem. 


> Example 7. Assume that the atoms A are (N,+). A valid input for the reachability 
problem for definable graphs over A is 


$V = N \qquad E = \{(a,b) : a,b \in A \text{ with } a = b+b \lor a+1 = b\} \qquad s = 7 \qquad t = 2$. 


For this particular input the answer is “yes” because one can go from 7 to 2 by doing 
several steps of the form “divide by two or add one”. For the same reason as discussed in 
Example 5, i.e. encoding Minsky machines, reachability is undecidable for definable graphs 
over Presburger arithmetic (N,+). Actually, the undecidability holds already for atoms 
(N, <), since Minsky machines use only increments and decrements on the counters, and 
these can be defined in first-order logic using only the order. Note that to get undecidability 
for (N, <) it is important that formulas in the guards of definable sets are allowed to use 
quantifiers. If only quantifier-free formulas were allowed in the guards, then graph 
reachability for (N, <) would be decidable [13]. 
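
The path from 7 to 2 can be found by a breadth-first search. The sketch below explores only vertices up to a bound `max_vertex`, an assumption we add to keep the search finite; it therefore finds witnesses for "yes" instances, and is in no way a decision procedure for the general problem, which is undecidable. The function names are ours.

```python
from collections import deque


def is_edge(a, b):
    """The guard of E: a = b + b or a + 1 = b."""
    return a == b + b or a + 1 == b


def successors(a):
    result = [a + 1]
    if a % 2 == 0:
        result.append(a // 2)
    return result


def find_path(source, target, max_vertex=1000):
    """Breadth-first search restricted to vertices up to max_vertex."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in successors(v):
            if w <= max_vertex and w not in parent:
                parent[w] = v
                queue.append(w)
    return None


path = find_path(7, 2)
print(path)  # [7, 8, 4, 2]
```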


Example 7 shows that the reachability problem is undecidable when the atoms are natural 
numbers with order, or any other richer structure such as addition or multiplication. What 
about definable graphs over atoms with equality only, or atoms such as (Q, <)? It turns out 
that for these atoms, reachability is decidable, and the algorithm is quite straightforward. 
Define $R_n$ to be the vertices which can be reached from the source by a path of at most n edges. 
A naive algorithm to solve graph reachability (see Figure 3) would be to simply compute 
the sequence $R_0 \subseteq R_1 \subseteq R_2 \subseteq \cdots$ until it stabilises, and then test if the target vertex is in 
the stable set. Here is a key property of the program in Figure 3. Assuming that the atoms 
have decidable first-order theory (an assumption which holds for Presburger arithmetic (N, +) but 
not for general arithmetic (N, +, ×)), each step (both for loops) of the naive 
algorithm can be carried out in finite time, but the number of steps (repeat loop) might be 
unbounded: 


    repeat                        (need not terminate)
      for v in R do               (the two for loops will always terminate,
        for w in V do              assuming A has decidable first-order theory)
          if {v,w} in E then
            R += {w}
    until R does not grow


A more formal statement is in the following lemma. 


> Lemma 8. Let A be a logical structure. Suppose that $E \subseteq V \times V$ and $R \subseteq V$ are given by 
set-builder expressions over A. Then one can compute a set-builder expression representing 


$R \cup \{v \in V : \{v,w\} \in E \text{ for some } w \in R\}$. 


Assume furthermore that A has decidable first-order theory. Then one can decide, given $R'$ 
and $R$ represented using set-builder expressions, if $R' \subseteq R$. 


The above lemma can be shown using [19, Proposition 2]; but it is mainly interesting as 
part of a more general framework, namely programming languages that deal with definable sets. 
There are currently two programming languages: a functional one [7], with an implementation 
as a Haskell library [20]; and an imperative one [12] with an implementation as a C++ 
library [19]. The example reachability program in Figure 3 is based on the imperative 
approach. The original version of the imperative programming language [12] assumed that 
the atoms were oligomorphic (see below), but the version from [19] relaxed this assumption to 
having a decidable first-order theory (which captures additional examples such as Presburger 
arithmetic). The semantics of both languages are designed so that one does not need to 
prove results like Lemma 8 by hand; but one can simply use more general principles like 
the following: every program without repeat can be simulated in finite time, assuming the 
inputs (i.e. program state before) and outputs (i.e. program state after) are represented using 
set-builder expressions. 
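
For comparison, here is the naive algorithm written for ordinary finite graphs in Python. This is not the API of the libraries [19, 20]; it is a sketch which assumes that V and E are finite Python sets and that edges are ordered pairs (v, w). The inner set comprehension plays the role of the two for loops, and the while loop plays the role of repeat.

```python
def reachable(V, E, s, t):
    """Compute R_0, R_1, ... until the sequence stabilises.
    Returns the answer and the number of rounds in which R grew."""
    R = {s}
    rounds = 0
    while True:
        # one pass of the two nested for loops
        new = {w for v in R for w in V if (v, w) in E}
        if new <= R:            # R does not grow
            return t in R, rounds
        R |= new
        rounds += 1


V = set(range(5))
E = {(0, 1), (1, 2), (3, 4)}
print(reachable(V, E, 0, 2))  # (True, 2)
print(reachable(V, E, 0, 4))  # (False, 2)
```

For finite inputs termination is trivial; the point of the following sections is that the same loop also terminates on infinite definable inputs when the atoms are oligomorphic.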

As follows from Example 7, decidability of the first-order theory of A alone does not 
guarantee that the repeat loop in the program from Figure 3 will terminate in finitely 
many steps. Remarkably, when the atoms have equality only, or they are (Q, <), then the 
repeat loop necessarily terminates. The reason is that atoms with equality only or (Q, <) 
are examples of oligomorphic structures. In the next two sections, we discuss oligomorphic 
structures and prove this termination (Theorem 13). The assumption that the atoms are 
oligomorphic is what makes the theory of definable sets robustly well behaved. 

6 Oligomorphism and orbit-finiteness 


In this section we describe what it means for a logical structure to be oligomorphic. The 
main use of this assumption is that it allows us to use a relaxed notion of finiteness for sets, 
called orbit-finiteness; in particular all definable sets will turn out to be orbit-finite. 


Definition of oligomorphism 


If A is a logical structure, then an automorphism is defined to be any permutation of 
its universe which is consistent with all the relations and functions in the vocabulary of 
A. For example, when A has only equality in its signature then an automorphism is any 
permutation; while if A is (Q, <) then an automorphism is any order-preserving permutation. 
The structures (N, <) and (N, +) have no automorphisms other than the identity, while (Z, <) has only translations 
as automorphisms. 


> Definition 9 (Oligomorphism). Consider a logical structure A. We say that two tuples 
$\bar a, \bar b \in A^k$ are in the same orbit if there exists an automorphism of A which takes $\bar a$ to $\bar b$ 
componentwise. The structure A is called oligomorphic if for every k, the “same orbit” 
equivalence relation on $A^k$ has finitely many equivalence classes. 


The notion of oligomorphism made its appearance in model theory in 1959, thanks to a 
theorem proved independently by Engeler, Ryll-Nardzewski and Svenonius: for a countable 
structure, being oligomorphic is equivalent to having an $\omega$-categorical theory, see [16, Theorem 
7.3.1]. One important corollary of this theorem (and its proof) is that in an oligomorphic 
structure, every orbit of $A^k$ can be defined by a formula of first-order logic with k free 
variables; we will use this property later on. 


> Example 10. Here are some examples and non-examples of oligomorphic structures. 

- Assume that A = (N, <). This structure has no automorphisms other than the identity, and therefore the 
  “same orbit” equivalence relation on A has infinitely many equivalence classes (which are 
  singletons). Therefore A is not oligomorphic. 

- Assume that A = (Z, <). The automorphisms are translations, and hence the “same 
  orbit” equivalence relation has one equivalence class on A. However, there are infinitely 
  many equivalence classes for $A^2$, because 

  $(a_1, a_2), (b_1, b_2)$ are in the same orbit iff $a_1 - a_2 = b_1 - b_2$. 

  Therefore A is not oligomorphic. 

- Assume that A = (N, =), i.e. a countably infinite set with equality only. For every k, 
  tuples in $A^k$ are in the same orbit if and only if they have the same equality type, and 
  there are finitely many equality types. Therefore, A is oligomorphic. 

- Assume that A = (Q, <). It is not difficult to see that for every k, tuples $\bar a, \bar b \in A^k$ are in the same orbit 
  if and only if they have the same order type, and there are finitely many order types. 
  Therefore, A is oligomorphic. 

- Every structure over a finite vocabulary without functions that is homogeneous is oligomorphic. 
  This covers Fraïssé limits of classes of finite relational structures, such as the 
  previous two items, or the countable random graph. For more on homogeneous structures, 
  the Fraïssé limit and the random graph, see [16, Section 7]. 
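
For the two oligomorphic examples, the number of orbits of k-tuples can be counted by enumerating types. The sketch relies on the characterisation stated above (same orbit if and only if same equality type, respectively same order type) and on the observation that k sample atoms are enough to realise every type of a k-tuple. The function names are ours.

```python
from itertools import product


def equality_type(t):
    """Which positions of the tuple carry equal atoms."""
    first_seen = {}
    return tuple(first_seen.setdefault(x, len(first_seen)) for x in t)


def order_type(t):
    """How the positions of the tuple compare under the order."""
    ranks = {x: i for i, x in enumerate(sorted(set(t)))}
    return tuple(ranks[x] for x in t)


def count_types(type_of, k):
    return len({type_of(t) for t in product(range(k), repeat=k)})


print([count_types(equality_type, k) for k in (1, 2, 3, 4)])  # [1, 2, 5, 15]
print([count_types(order_type, k) for k in (1, 2, 3, 4)])     # [1, 3, 13, 75]
```

The counts grow quickly with k, but for each fixed k they are finite, which is all that oligomorphism asks for.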
Orbit-finiteness 


Oligomorphic structures turn out to be exceptionally well-behaved with respect to definable 
sets. The reason for this is that in an oligomorphic structure one can distinguish a relaxed 
notion of finiteness, called orbit-finiteness, which is well behaved and can be used to prove 
results such as the termination of the algorithm in Figure 3. 

To define orbit-finiteness, we need to introduce a little set terminology. Fix a logical 
structure A. Define the cumulative hierarchy over A to be objects that are built using atoms 
(i.e. elements of A) and set brackets in a well-founded way. More precisely, for an ordinal 
number $\alpha$ we define the rank $\alpha$ objects of the cumulative hierarchy as follows: for $\alpha = 0$ 
the rank $\alpha$ objects are the empty set and every atom; for $\alpha > 0$ the rank $\alpha$ objects are sets 
whose elements are objects of rank $< \alpha$. The cumulative hierarchy is the union of all ranks. 
For example, every definable set over A is in the cumulative hierarchy, but there are many 
more sets in the cumulative hierarchy (e.g. the cumulative hierarchy is closed under taking 
arbitrary subsets, unlike definable sets). 

If $\pi$ is an automorphism of A (actually, any function of type $A \to A$), then one can apply 
$\pi$ to an object x in the cumulative hierarchy, by simply applying it to the atoms that are used 
in x; the result is a new object $\pi(x)$ in the cumulative hierarchy with the same rank. 


> Definition 11 (Finite support and orbit-finiteness). Let A be a logical structure. For x in 
the cumulative hierarchy over A, we say that x is finitely supported if there exists a tuple of 
atoms $\bar a$ (which is called a support of x) such that 


$\pi(\bar a) = \sigma(\bar a)$ implies $\pi(x) = \sigma(x)$ for all automorphisms $\pi, \sigma$ of A. 


We say that x is orbit-finite if the following equivalence relation on elements of x has finitely 
many equivalence classes: 


$y \sim z$ if $y = \pi(z)$ for some automorphism $\pi$ of A. 
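
The action of an automorphism on an object, and the notion of support, can be tried out on a finite truncation of the atoms (N, =). In the sketch below the universe is {0, 1, 2, 3} and automorphisms are all its permutations; this truncation is an assumption of ours and is only illustrative, since in a finite universe some tuples become supports that would not be supports over the infinite atoms. The names `apply` and `supports` are hypothetical.

```python
from itertools import permutations


def apply(pi, x):
    """Apply a permutation (a dict on atoms) to an atom or a nested frozenset."""
    if isinstance(x, frozenset):
        return frozenset(apply(pi, y) for y in x)
    return pi[x]


def supports(atoms, x, universe):
    """Check the condition of Definition 11 by brute force:
    permutations that agree on `atoms` must agree on x."""
    perms = [dict(zip(universe, p)) for p in permutations(universe)]
    return all(
        apply(pi, x) == apply(sigma, x)
        for pi in perms
        for sigma in perms
        if all(pi[a] == sigma[a] for a in atoms)
    )


universe = (0, 1, 2, 3)
x = frozenset({0, 1})
print(supports((0, 1), x, universe))  # True
print(supports((0,), x, universe))    # False: atom 1 can be moved while 0 stays
```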


The notion of finite supports is fundamental to set theories such as Fraenkel-Mostowski, 
and more recently to the theory of nominal sets [24]. To the author’s best knowledge, the 
notion of orbit-finiteness was first explicitly proposed in [6]. In the terminology of the above 
definition, A is oligomorphic if and only if $A^k$ is orbit-finite for every k. The following 
theorem shows that, under the assumption that the atoms are oligomorphic, the notion 
of orbit-finiteness is well behaved and definable sets are the same as sets which are hereditarily 
orbit-finite. For a proof of the following theorem, see [5]. 


> Theorem 12. Let A be a logical structure which is oligomorphic, and let x be in the 
cumulative hierarchy over A. 

1. x is orbit-finite if and only if for every tuple of atoms $\bar a$, the following equivalence relation 
   on elements of x has finitely many equivalence classes: 

   $y \sim_{\bar a} z$ if $y = \pi(z)$ for some automorphism $\pi$ of A which satisfies $\pi(\bar a) = \bar a$. 

2. x is definable if and only if it is hereditarily orbit-finite (i.e. x is finitely supported and 
   orbit-finite, and both these properties also hold for elements of x, their elements, and so 
   on recursively). 


The equivalence in item 2 of the above theorem is very useful for computation: in some 
cases it is more convenient to use definable sets (e.g. to represent sets in a finite way), and in 
some cases it is more convenient to use orbit-finite sets (e.g. in termination proofs). We now 
show an example of this usefulness. 

Termination of the reachability algorithm 


In Figure 3 from Section 5, we presented a naive reachability algorithm for definable graphs. 
We also remarked that, as long as the atoms have decidable first-order theory, then each 
iteration of the repeat loop could be evaluated in finite time. We are now ready to show that, 
as long as the atoms are oligomorphic, the repeat loop is performed only finitely 
many times. 


> Theorem 13. Assume that A is oligomorphic. Then the reachability algorithm in Figure 3 
terminates on every input. 


Proof. Assume that the input is a graph with distinguished source and target, and that 
each part of the input (vertices, edges, source and target) is a definable set represented by a 
set-builder expression. By item 2 in Theorem 12, each part of the input has a finite support. 
By combining these finite supports into a single tuple, we can conclude there is some tuple 
of atoms $\bar a$ which supports all parts of the input, i.e. $\bar a$ supports the vertices, the edges and 
the source and target vertices. For $n \in \{0,1,\ldots\}$ define $R_n$ to be the vertices which are 
reachable from the source vertex by a path with at most n edges. Recall the equivalence 
relation $\sim_{\bar a}$ mentioned in item 1 of Theorem 12. Using induction on n and the assumption 
that $\bar a$ supports the source vertex and the edges, we get the following observation: 


(*) each set $R_n$ is a union of equivalence classes of $\sim_{\bar a}$. 


The set of vertices is orbit-finite by item 2 of Theorem 12, and therefore by item 1 it has finitely 
many equivalence classes of $\sim_{\bar a}$. By (*) each step of the reachability algorithm in which R grows 
adds at least one new equivalence class, and therefore the algorithm 
must terminate in a finite number of steps. ∎
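
The counting step at the end of the proof can be made explicit. In the display below, $m$ and $N$ are names introduced here: $m$ is the number of equivalence classes of $\sim_{\bar a}$ on the set of vertices, and $N$ is the first index at which the sequence stabilises.

```latex
\[
  R_0 \subsetneq R_1 \subsetneq \cdots \subsetneq R_N = R_{N+1},
  \qquad
  n + 1 \;\le\; \#\{\text{classes of } \sim_{\bar a} \text{ contained in } R_n\} \;\le\; m
  \quad \text{for } n \le N,
\]
\[
  \text{hence } N \le m - 1 .
\]
```

The lower bound holds because $R_0$ consists of the source vertex, which forms a class on its own since it is supported by $\bar a$, and because each strict inclusion adds at least one class by (*).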


The goal of the above proof was to illustrate the interplay between definability (as a 
way of representing infinite inputs) and orbit-finiteness (as a way of proving termination 
for algorithms). Other examples of this interplay include algorithms which use a fixpoint 
computation, such as the standard algorithm for emptiness of a context-free grammar (which 
works if the grammar is definable over oligomorphic atoms) or the Moore minimisation 
algorithm for deterministic automata (which works for definable automata over oligomorphic 
atoms). 